
Optimizing ETL Dataflow Using Shared Caching and Parallelization Methods

Abstract

Extract-Transform-Load (ETL) handles large amounts of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. To minimize the time and resources that ETL dataflows require, this paper presents a framework that optimizes dataflows using shared caching and parallelization techniques. The framework classifies the components of an ETL dataflow into categories based on their data operation properties, then partitions the dataflow at different granularities according to that classification. It then applies optimization techniques such as cache reuse, pipelining, and multi-threading to the partitioned dataflows. The proposed techniques reduce the system memory footprint and the frequency of copying data between components, and they take full advantage of the computing power of multi-core processors. The experimental results show that the proposed optimization framework is 4.7 times faster than ordinary ETL dataflows (those that do not use the proposed optimization techniques) and outperforms a similar tool, Kettle.
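
The classify, partition, and optimize steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Component`, `partition`, `run_partition`, and `run_flow` are hypothetical, and the two-way `blocking` flag is an assumed simplification of the paper's component categories. Non-blocking components in one partition reuse the same row cache instead of copying rows between components, and chunks of rows are processed by separate threads.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Component:
    name: str
    blocking: bool  # assumed classification: True if it needs all rows at once
    fn: Callable[[List[Row]], List[Row]]


def partition(flow: List[Component]) -> List[List[Component]]:
    """Group consecutive non-blocking components; isolate each blocking one."""
    parts: List[List[Component]] = []
    for comp in flow:
        if not comp.blocking and parts and not parts[-1][0].blocking:
            parts[-1].append(comp)
        else:
            parts.append([comp])
    return parts


def run_partition(part: List[Component], rows: List[Row], workers: int) -> List[Row]:
    if part[0].blocking:
        # A blocking component (e.g. a sort) must see the whole input.
        return part[0].fn(rows)

    # Split the rows into chunks; each chunk is a cache shared by every
    # component in the partition, so rows are not copied between components.
    size = max(1, -(-len(rows) // workers))
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]

    def work(cache: List[Row]) -> List[Row]:
        for comp in part:
            cache = comp.fn(cache)
        return cache

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work, chunks))  # map preserves chunk order
    return [row for chunk in results for row in chunk]


def run_flow(flow: List[Component], rows: List[Row], workers: int = 4) -> List[Row]:
    for part in partition(flow):
        rows = run_partition(part, rows, workers)
    return rows


# Example components (hypothetical): the first two update the cache in place.
def add_total(cache: List[Row]) -> List[Row]:
    for row in cache:
        row["total"] = row["qty"] * row["price"]
    return cache


def keep_large(cache: List[Row]) -> List[Row]:
    cache[:] = [r for r in cache if r["total"] >= 10]
    return cache


def sort_by_total(rows: List[Row]) -> List[Row]:
    return sorted(rows, key=lambda r: r["total"])
```

In standard CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so this sketch shows only the structure of the approach, not the performance figures reported in the abstract.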

Bibliographic details

  • Author: Liu, Xiufeng
  • Year: 2014
  • Original format: PDF
  • Language: English
